Word Class Prediction of Ambiguous and Unknown Words of Punjabi Language Using Bi-gram Methods

Author

  • Sanjeev Kumar Sharma
Abstract

Ambiguous and unknown words are found in every language. Ambiguous words are words whose meaning, and therefore word class, differs from sentence to sentence depending on the context. Assigning the correct word class to these words is a fundamental task in almost all NLP applications. Much work has been done on this problem, and much remains to be done. Many statistical and rule-based techniques have been applied to assign the correct word class to words with an ambiguous word class. The most commonly used statistical techniques are HMM (Hidden Markov Model), SVM (Support Vector Machine), ME (Maximum Entropy), CRF (Conditional Random Field) and N-gram based techniques. This research paper discusses a bigram technique for assigning the correct word class to the ambiguous and unknown words of the Punjabi language. The word classes are drawn from the tag set proposed by TDIL.

Keywords: Ambiguous words, word class, unknown words, bi-gram technique, TDIL proposed Punjabi tag set.
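The abstract does not give the details of the bigram technique, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a small hand-tagged corpus, and it picks the word class by looking at the tag of the preceding word. The function names `train` and `predict_tag`, the `<s>` sentence-start marker, and the English toy data with tags such as `DET` and `NOUN` are all hypothetical stand-ins; they are not the TDIL Punjabi tag set.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count tag bigrams and word/tag pairs from a tagged corpus."""
    transitions = defaultdict(Counter)  # previous tag -> counts of the next tag
    emissions = defaultdict(Counter)    # word -> counts of tags seen with it
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[word][tag] += 1
            prev = tag
    return transitions, emissions


def predict_tag(word, prev_tag, transitions, emissions):
    """Pick a tag for `word` given the tag of the preceding word."""
    following = transitions.get(prev_tag, Counter())
    candidates = emissions.get(word)
    if candidates:
        # Known (possibly ambiguous) word: among the tags it was seen with,
        # prefer the one most frequent after prev_tag; break ties by how
        # often the word itself carried that tag.
        return max(sorted(candidates),
                   key=lambda t: (following[t], candidates[t]))
    if following:
        # Unknown word: fall back to the most frequent tag after prev_tag.
        return max(sorted(following), key=lambda t: following[t])
    return None  # no evidence at all for this context


# Toy corpus (assumed data, for illustration only).
corpus = [
    [("the", "DET"), ("light", "NOUN"), ("is", "VERB"), ("on", "ADP")],
    [("they", "PRON"), ("light", "VERB"), ("the", "DET"), ("lamp", "NOUN")],
]
transitions, emissions = train(corpus)
print(predict_tag("light", "DET", transitions, emissions))
print(predict_tag("light", "PRON", transitions, emissions))
print(predict_tag("candle", "DET", transitions, emissions))
```

In this toy setup the ambiguous word "light" is resolved differently after a determiner than after a pronoun, and the unseen word "candle" receives whichever tag most often follows a determiner. A practical tagger would also need smoothing for unseen tag pairs and a much larger annotated corpus.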

Similar resources

Prediction of part of speech tags for punjabi using support vector machines

Part-Of-Speech (POS) tagging is a task of assigning the appropriate POS or lexical category to each word in a natural language sentence. In this paper, we have worked on automated annotation of POS tags for Punjabi. We have collected a corpus of around 27,000 words, which included the text from various stories, essays, day-to-day conversations, poems etc., and divided these words into different...

Terminology of Combining the Sentences of Farsi Language with the Viterbi Algorithm and BI-GRAM Labeling

This paper, based on the Viterbi algorithm, selects the most likely combination of different wording from a variety of scenarios. In this regard, the Bi-gram and Unigram tags of each word, based on the letters forming the words, as well as the bigram and unigram labels After the breakdown into the composition or moment of transition from the decomposition to the combination obtained from th...

Part of Speech Tagging of Punjabi Language using N Gram Model

POS tagging is the process of assigning a correct tag to each word of a sentence. We attempted to improve the accuracy of the existing Punjabi POS tagger. This POS tagger fails to resolve the ambiguity of a number of words because it uses only handwritten rules. A bi-gram model has been used to solve the part-of-speech tagging problem. An annotated corpus was used for training and estimating of bi gram ...

Word Disambiguation in Shahmukhi to Gurmukhi Transliteration

To write the Punjabi language, Punjabi speakers use two different scripts, Perso-Arabic (referred to as Shahmukhi) and Gurmukhi. Shahmukhi is used by the people of Western Punjab in Pakistan, whereas Gurmukhi is used by most people of Eastern Punjab in India. The natural written text in Shahmukhi script has missing short vowels and other diacritical marks. Additionally, the presence of ambiguous chara...

English to Punjabi Transliteration using Orthographic and Phonetic Information

Machine transliteration is an emerging and a very important research area in the field of machine translation. While the translation system finds the same meaning word/sentence in another language, the transliteration helps us to pronounce them. This paper describes the process of transliteration from English to Punjabi language using a rule based approach. Both source grapheme and phonetic inf...

Publication date: 2015